Multiple Source Phoneme Recognition Aided by Articulatory Features
Authors
Abstract
This paper presents an experiment in speech recognition in which multiple phoneme recognisers are applied to the same utterance. When these recognisers agree on a hypothesis for the same time interval, that hypothesis is assumed to be correct. When they disagree, fine-grained phonetic features, called articulatory features, recognised from the same speech utterance are used to construct an articulatory feature-based phoneme. If the output of either phoneme recogniser for that interval matches the articulatory feature-based phoneme, that phoneme is selected as the hypothesis for the interval. If no hypothesis is found, the articulatory feature-based phoneme is underspecified and the matching process is repeated. The results of the experiment show that the accuracy of the final output is greater than that of either of the two initial phoneme recognisers.
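The arbitration scheme described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual system: the phoneme labels, the feature inventory, and the `PHONEME_FEATURES` table are assumptions introduced purely for demonstration, as is the choice of which feature to drop when underspecifying.

```python
# Hypothetical sketch of the agreement/arbitration scheme from the abstract.
# The phoneme set and feature table below are illustrative assumptions.
PHONEME_FEATURES = {
    "p": {"voicing": "voiceless", "place": "bilabial", "manner": "stop"},
    "b": {"voicing": "voiced",    "place": "bilabial", "manner": "stop"},
    "t": {"voicing": "voiceless", "place": "alveolar", "manner": "stop"},
    "d": {"voicing": "voiced",    "place": "alveolar", "manner": "stop"},
}

def feature_based_phonemes(detected, table=PHONEME_FEATURES):
    """Phonemes whose features agree with every (still specified) detected feature."""
    return [p for p, feats in table.items()
            if all(feats.get(k) == v for k, v in detected.items())]

def arbitrate(hyp_a, hyp_b, detected_features):
    """Choose a phoneme hypothesis for one time interval."""
    # 1. Agreement between the two recognisers is accepted outright.
    if hyp_a == hyp_b:
        return hyp_a
    # 2. Disagreement: build the articulatory feature-based phoneme set and
    #    accept either recogniser's output if it matches.
    features = dict(detected_features)
    while features:
        candidates = feature_based_phonemes(features)
        if hyp_a in candidates:
            return hyp_a
        if hyp_b in candidates:
            return hyp_b
        # 3. No match: underspecify (drop one feature) and repeat the matching.
        features.popitem()
    return None  # interval left unresolved

print(arbitrate("p", "b",
                {"voicing": "voiceless", "place": "bilabial", "manner": "stop"}))
# → p
```

Underspecification is modelled here simply as dropping one detected feature per iteration; the paper does not specify the ordering, so `popitem` stands in for whatever policy the authors used.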
Similar Resources
Improving Articulatory Feature and Phoneme Recognition Using Multitask Learning
Speech sounds can be characterized by articulatory features. Articulatory features are typically estimated using a set of multilayer perceptrons (MLPs), i.e., a separate MLP is trained for each articulatory feature. In this paper, we investigate a multitask learning (MTL) approach for joint estimation of articulatory features, with and without phoneme classification as a subtask. Our studies show th...
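The contrast this abstract draws, one independent MLP per articulatory feature versus a single multitask network with a shared hidden layer and one output head per task, can be sketched as below. All dimensions, task names, and sizes are assumptions for illustration only; this is an untrained forward pass, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions): one acoustic frame in, per-task
# posteriors out. In MTL, all tasks share the same hidden representation
# instead of each articulatory feature having its own separate MLP.
N_IN, N_HIDDEN = 39, 64  # e.g. an MFCC frame size
TASKS = {"voicing": 3, "place": 8, "manner": 6, "phoneme": 40}

# One shared hidden layer plus a separate linear output head per task.
W_shared = rng.standard_normal((N_IN, N_HIDDEN)) * 0.1
heads = {t: rng.standard_normal((N_HIDDEN, n)) * 0.1 for t, n in TASKS.items()}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(frame):
    h = np.tanh(frame @ W_shared)              # shared representation
    return {t: softmax(h @ W) for t, W in heads.items()}

posteriors = forward(rng.standard_normal(N_IN))
print({t: p.shape for t, p in posteriors.items()})
```

Training would backpropagate the summed task losses through the shared layer, which is what lets the phoneme-classification subtask influence the articulatory feature estimates.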
Combining articulatory and acoustic information for speech recognition in noisy and reverberant environments
Robust speech recognition under varying acoustic conditions may be achieved by exploiting multiple sources of information in the speech signal. In addition to an acoustic signal representation, we use an articulatory representation consisting of pseudoarticulatory features as an additional information source. Hybrid ANN/HMM recognizers using either of these representations are evaluated on a co...
Articulatory Features for Robust Visual Speech Recognition by Ekaterina Saenko
This thesis explores a novel approach to visual speech modeling. Visual speech, or a sequence of images of the speaker's face, is traditionally viewed as a single stream of contiguous units, each corresponding to a phonetic segment. These units are defined heuristically by mapping several visually similar phonemes to one visual phoneme, sometimes referred to as a viseme. However, experimental e...
Human Feature Extraction: the Role of the Articulatory Rhythm
Neuro-physical investigations [1] hint at a new paradigm for feature extraction not used in ASR. This paradigm is based on synchronized brain-to-brain oscillations, active during speech production and speech perception. This mechanism leads to an evolving theory, which the author calls the Unified Theory of Human Speech Processing (UTHSP). The core elements of this theory are the articulatory rhythm ...
On acquiring speech production knowledge from articulatory measurements for phoneme recognition
The paper proposes a general version of a coupled Hidden Markov/Bayesian Network model for performing phoneme recognition on acoustic-articulatory data. The model uses knowledge learned from the articulatory measurements, available for training, for phoneme recognition on the acoustic input. After training on the articulatory data, the model is able to predict 71.5% of the articulatory state se...